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Abstract 

Parameter estimation in dynamic systems finds applications in various disciplines, including system biology. The 
well-known expectation-maximization (EM) algorithm is a popular method and has been widely used to solve system 
identification and parameter estimation problems. However, the conventional EM algorithm cannot exploit the 
sparsity. On the other hand, in gene regulatory network inference problems, the parameters to be estimated often 
exhibit sparse structure. In this paper, a regularized expectation-maximization (rEM) algorithm for sparse parameter 
estimation in nonlinear dynamic systems is proposed that is based on the maximum a posteriori (MAP) estimation and 
can incorporate the sparse prior. The expectation step involves the forward Gaussian approximation filtering and the 
backward Gaussian approximation smoothing. The maximization step employs a re-weighted iterative thresholding 
method. The proposed algorithm is then applied to gene regulatory network inference. Results based on both 
synthetic and real data show the effectiveness of the proposed algorithm. 

Keywords: Nonlinear dynamic system; Parameter estimation; Sparsity; Expectation-maximization; Forward-backward 
recursion; Gaussian approximation; Gene regulatory network 



1 Introduction 

The dynamic system is a widely used modeling tool that 
finds applications in many engineering disciplines. Tech- 
niques for state estimation in dynamic systems have been 
well established. Recently, the problem of sparse state 
estimate has received significant interest. For example, 
various approaches to static sparse state estimation have 
been developed in [1-4], where the problem is essentially 
an underdetermined inverse problem, i.e., the number of 
measurements is small compared to the number of states. 
Extensions of these methods for dynamic sparse state 
estimation have been addressed in [5-7]. 

The expectation-maximization (EM) algorithm has also 
been applied to solve the sparse state estimate problem 
in dynamic systems [8-12]. In particular, in [8-10], the 
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EM algorithm is employed to update the parameters of 
the Bernoulli-Gaussian prior and the measurement noise. 
These parameters are then used in the generalized approx- 
imate message passing algorithm [8-10]. In [12,13], the 
EM algorithm is used to iteratively estimate the param- 
eters that describe the prior distribution and noise vari- 
ances. Moreover, in [14], the EM algorithm is used for 
blind identification, where the sparse state is explored. 
Note that in the above works, only linear dynamic systems 
are considered. 

In this paper, we focus on the sparse parameter esti- 
mation problem instead of the sparse state estimation 
problem. We consider a general nonlinear dynamic sys- 
tem, where both the state equation and the measurement 
equation are parameterized by some unknown parameters 
which are assumed to be sparse. One particular applica- 
tion is the inference of gene regulatory networks. The gene 
regulatory network can be modeled by the state-space 
model [15], in which the gene regulations are represented 
by the unknown parameters. The gene regulatory network 
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is known to be sparse due to the fact that a gene directly 
regulates or is regularized by a small number of genes 
[16-19]. The EM algorithm has been applied to parame- 
ter estimation in dynamic systems [20]. However, the EM 
algorithm cannot exploit the sparsity of the parameters. 
Here, we propose a regularized expectation-maximization 
(rEM) algorithm for sparse parameter estimation in non- 
linear dynamic systems. Specifically, the sparsity of the 
parameters is imposed by a Laplace prior and we consider 
the approximate maximum a posteriori (MAP) estimate 
of the parameters. It should be emphasized that the pro- 
posed method is an approximate MAP-EM algorithm 
based on various Gaussian assumptions and quadrature 
procedures for approximating Gaussian integrals. Note 
that the MAP-EM algorithm may get stuck at local min- 
ima or saddle points. Similar to the conventional EM 
algorithm, the rEM algorithm also consists of an expecta- 
tion step and a maximization step. The expectation step 
involves the forward Gaussian approximation filtering and 
the backward Gaussian approximation smoothing. The 
maximization step involves solving an £i minimization 
problem for which a re-weighted iterative thresholding 
algorithm is employed. To illustrate the proposed sparse 
parameter estimation method in dynamic systems, we 
consider the gene-regulatory network inference based on 
gene expression data. 

The unscented Kalman filter has been used in the infer- 
ence of gene regulatory network [15,21,22]. However, the 
methods proposed in [15,21,22] are fundamentally differ- 
ent with the method proposed in this paper. Firstly, the 
unscented Kalman filter is only used once in [15,21,22], 
while it is used in each iteration of the rEM algorithm in 
this paper. Secondly, not only the unscented Kalman filter 
but also the unscented Kalman smoother is used in our 
proposed rEM algorithm. In essence, only the observation 
before time k is used to the estimation at time k in the 
unscented Kalman filter. However, in our rEM algorithm, 
all observation data is used to the estimation at time k (by 
the unscented Kalman smoother). The fundamental dif- 
ference between the proposed work and that of [9] is that 
the proposed work is for the sparse parameter estimation 
problem of the dynamic system, while that of [9] is only for 
the sparse parameter estimation of the static problem. In 
addition, a general nonlinear dynamic system is involved 
in our work and only linear system is involved in the work 
of [9]. The main difference between the proposed work 
and that of [23] is that the sparsity constraint is enforced. 
The main contribution of this paper is to use the sparsity- 
enforced EM algorithm to solve the sparse parameter 
estimation problem. In addition, the reweighted iterative 
threshold algorithm is proposed to solve the ii optimiza- 
tion algorithm. To the best knowledge of the authors, 
the proposed rEM with the reweighted iterative thresh- 
old optimization algorithm is innovative. Furthermore, we 



have systematically investigated the performance of the 
proposed algorithm and compared the results with other 
conventional algorithms. 

The remainder of this paper is organized as follows. In 
Section 2, the problem of the sparse parameter estima- 
tion in dynamic systems is introduced and the regularized 
EM algorithm is formulated. In Section 3, the E-step 
of rEM that involves forward-backward recursions and 
Gaussian approximations is discussed. Section 4 discusses 
the li optimization problem involved in the maximization 
step. Application of the proposed rEM algorithm to gene 
regulatory network inference is discussed in Section 5. 
Concluding remarks are given in Section 6. 

2 Problem statement and the MAP-EM algorithm 

We consider a general discrete-time nonlinear dynamic 
system with unknown parameters, given by the following 
state and measurement equations: 

=f(Xk-lJ)+Uk, (1) 

and yi, = hiXkJ) + Vf,, (2) 

where xj^ and are the state vector and the observation 
vector at time /c, respectively; 0 is the unknown param- 
eter vector; /(•) and h(') are two nonlinear functions; 

~ Af(0, U k) is the process noise, and ^ A/'(0, R^) is 
the measurement noise. It is assumed that both {uj^} and 
{vk) are independent noise processes and they are mutu- 
ally independent. Note that the nonlinear functions/ and 
h are assumed to be differentiable. 

Define the notation 4[y^, . . . ,y^]. The problem 
considered in this paper is to estimate the unknown sys- 
tem parameter vector 0 from the length-/C measurement 
data r^. We assume that 0 is sparse. In particular, it has a 
Laplacian prior distribution which is commonly used as a 
sparse prior, 

m . 

i=i ^ 

In the EM algorithm and the MAP-EM algorithm [23], 
given an estimate a new estimate 0'' is given by 

(9'' = argmaxQ((9,(90, (4) 
e 

and 

e" = arg max [Q((9, 0') + logpiO)] , (5) 
0 

respectively. 

Note that the regularized EM can be viewed as a spe- 
cial MAP-EM. To differentiate the sparsity-enforced EM 
algorithm from the general MAP-EM algorithm, rEM is 
used. In this paper, the following assumptions are made. 
(1) The probability density function of the state is assumed 
to be Gaussian. The Bayesian filter is optimal; however, 



Jia and Wang EURASIP Journal on Bioinformatics and Systems Biology 2014, 2014:5 
http://bsb.eurasipjournals.eom/content/2014/1/5 



Page 3 of 1 5 



exact finite-dimensional solutions do not exist. Hence, 
numerical approximation has to be made. The Gaussian 
approximation is frequently assumed due to the relatively 
low complexity and high accuracy [24-26]. (2) The inte- 
grals are approximated by various quadrature methods. 
Many numerical rules, such as Gauss-Hermite quadra- 
ture [25], unscented transformation [27], cubature rule 
[24], and the sparse grid quadrature [26], as well as the 
Monte Carlo method [28], can be used to approximate 
the integral. However, the quadrature rule is the best 
when computational complexity and accuracy are both 
considered [29]. 

We next consider the expression of the Q-function in 
(5). Due to the Markovian structure of the state-space 
model (1) to (2), we have 

K K 
k=2 k=l 

(6) 

Therefore, 

=j\ogp{xi\e)p{xi\Y^,e')dxi 

+ ^/ \o%p{Xk\Xk-i,B) p{Xk,Xk-i\Y^ ,0')dxk-idxk 
k^2^ ^ ' ' 

1< r 

+ J2j \ogp(yk\Xk,0) pixk\Y^,0')6xk, 

- i (yk-h{Xk,e))TR-^ (y^_h(xk,0))-dk 

(7) 

where c/^ = ^[log|i//:| + dim(;Vy^) log(27r)] and dj^ = 
^[log \Ri(\ + dim(j^) log(27r)]. We assume that the initial 
state xi is independent of the parameter 0. Hence, with the 
prior given in (3), the optimization in (5) can be rewritten 
as 

e" = arg max [Q(0,e') + logpiO)] 

0 

= argmin ^|2c^+ f [(x^ -fixk-i,0f U'^Xk -fixk-i,0)] 

xp (Xk,Xk-i\Y^ ,e')dxk-idxk^ 

+ ^|24+ f [(yi,-h(xk,efR^'(y,,-h(xk,e)'\p{xk\Y'<,e')dxk\ 
+ 2||Xo0||i, 

(8) 

where X = [A.i, A.2, • • • , A.^]^, and 'o' denotes the point- 
wise multiplication. 

Note that in many applications, the unknown parame- 
ters 0 are only related to the state equation, but not to the 
measurement equation. Therefore, the second term in (8) 



can be removed. In the next section, we discuss the proce- 
dures for computing the densities p(Xk,x/(-i\Y^'^ ,0^) and 
p{Xk\Y^^ , 0'), the integrals, and the minimization in (8). 

3 The E-step: computing the Q-function 

We first discuss the calculation of the probability density 
functions of the states p{XkyXk-i\Y^'^ , 0') and p{Xk\Y^'^ , 0^) 
in (8), which involves a forward recursion of a point-based 
Gaussian approximation filter to compute p(Xk\Y^y 0^) and 
p{xk+i I Y^, 0'), k = 1, 2, /<C, and a backward recursion of 
a point-based Gaussian approximation smoother to com- 
pute p{Xk.Xk-i\Y^^.O') and p{Xk\Y^^ ,0'), k = KJ< - 
For notational simplicity, in the remainder of this 
section, we drop the parameter 0' . 

3.1 Forward recursion 

The forward recursion is composed of two steps: pre- 
diction and filtering. Specifically, given the prior proba- 
bility density function (PDF) p{xk-i\Y^^~^) at time k — 
1, we need to compute the predicted conditional PDF 
p{Xk\Y^~^)] then, given the measurement at time 
we update the filtered PDF p{xk\Y^)' These PDF recur- 
sions are in general computationally intractable unless the 
system is Unear and Gaussian. The Gaussian approxima- 
tion filters are based on the following two assumptions: 
(1) Given Y^~^, xj^-i has a Gaussian distribution, i.e., 
Xk-i\Y^~^ ~ M{xk-i\k-i^Pk-i\k-iy^ and (2) {xk.yj,) are 
jointly Gaussian, given Y^~^ , 

It then follows that the predictive PDF is Gaussian, i.e., 
Xk\Y^-^ - M{Xk\k-i.Pk\k-i\ with [24,26,27] 

Xk\k-i = nxk\Y^-^} = Vili^^-^ {f(xk-i)} > (9) 

jCfC^^-l) - Xk\k-i)(fiXk-i - Xk\k-l)^] + ^k-l> 

(10) 

where {^(^y^-i)} = f g(x)(l)(x; Xk-i\k-i> 

Pk-i\k-i)dXf and (t)(x;XyP) denotes the multivariate 
Gaussian PDF with mean x and covariance P. 

Moreover, the filtered PDF is also Gaussian, i.e., X/^IY^^ ^ 
J^(h\k.Pk\k) [24,26,27], where 

h\k = Mxk\y''} = h\k-i -^Lk(yf,-yf,\j,_i). 

(11) 

and Pk\k = Cov{^^|r^} = Pk\k-i - LkPf. (12) 
with 

h\k-i='^«,\Y^-^ih{xk)], (13) 
L,,^Pf(R,,+lfr\ (14) 
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(15) 

Pk = ^«,|y*-i [(f'(x,)-y,^,_,)(h(xk)-h\k-if] ■ 

(16) 

3.2 Backward recursion 

In the backward recursion, we compute the smoothed 
PDFs p(Xk,Xk-\-i\Y^'^) and p(Xk\Y^'^), Here, the approxi- 
mate assumption made is that conditioned on y^, x^ and 
Xk^i are jointly Gaussian [30], i.e.. 



Xk 



j I y^^n(^ 



^k\k 



Pk\k Q 

Pk-i-l\k 



(17) 



with Q = Cov{^^,^^+i|r^ 



(18) 



Due to the Markov property of the state-space model, 
we have p{xk\xk+i, Y^) = p(x/^\x/^j^i, Y^). Therefore, we 
can write [30] 



p(xky Xk+i I Y^^) = p(xk\xk+iy Y^^)p(xk-^i I Y^^) 
= p(xj,\xj,^i, Y^)p(xj,^i\Y^), 



(19) 



Now, assume that 

Xk+ilY'"^ ~ MiXk+i.h+i). with Xk = xk\k, Pr = Pk\k- 

(20) 

It then follows from (17) and (19) that [30] 

DkPk+i 
Pk+iDl Pk+i 



where 

Xk = Xl(\k + - %+l|A:)> 

Pk = Pk\k + Dkih+i - Pk+i\k)Dl, 
Dk = CkP^l^^,^. 



' Xk 


\Y^ -^^(^ 


~Xk 


1 


_ ^k-hl _ 




_ %+i _ 





(21) 



(22) 
(23) 
(24) 



3.3 Approximating the integrals 

The integrals associated with the expectations in the 
forward-backward recursions for computing the approx- 
imate state PDFs, i.e., (9), (10), (13), (15), (16), and 
(18), as well as the integrals involved in computing the 
function Q{OyO') in (8), are integrals of Gaussian type 



that can be efficiently approximated by various quadra- 
ture methods. Specifically, if a set of weighted points 
[{ypWi),i = 1, . . .,A/^} can be used to approximate the 
integral 



IEA/-(o,/)te(^)} = / g{x)(l) (x; 0,7) dx^Yl, ^ig^Vd^ 

^ i=i 



(25) 



then the general Gaussian-type integral can be approxi- 
mated by 



^EjV'(;t,P)te(^)} = j {x;x,P) dx^^ mgiSyi + x), 

(26) 



i=l 



where P = SS^ and 5^ can be obtained by Cholesky 
decomposition or singular value decomposition (SVD). 

By using different point sets, different Gaussian approx- 
imation filters and smoothers can be obtained, such as 
the Gauss-Hermite quadrature (GHQ) [25], the unscented 
transform (UT) [27], the spherical-radial cubature rule 
(CR) [24], the sparse grid quadrature rule (SGQ) [26], and 
the quasi Monte Carlo method (QMC) [28]. Both the UT 
and the CR are the third-degree numerical rules which 
means the integration can be exactly calculated when^(;v) 
is a polynomial with the degree up to three. In addition, 
the form of the CR is identical to the UT with a specific 
parameter. The main advantage of the UT and the CR is 
that the number of points required by the rule increases 
linearly with the dimension. However, one problem of the 
UT and the CR is that the high-order information of the 
nonlinear function is difficult to capture so that the accu- 
racy may be low when g(x) is a highly nonlinear function. 
The GHQ rule, in contrast, can capture arbitrary degree 
information of g{x) by using more points. It has been 
proven that GHQ can provide more accurate results than 
the UT or the CR [25,26]. Similarly, the QMC method 
can also obtain more accurate results than the UT. How- 
ever, both the GHQ rule and the QMC method require 
a large number of points for the high-dimensional prob- 
lem. Specifically, the number of points required by the 
GHQ rule increases exponentially with the dimension. To 
achieve a similar performance of the GHQ with a small 
number of points, the SGQ is proposed [26], where the 
number of points increases only polynomially with the 
dimension. 

For the numerical results in this paper, the UT is used in 
the Gaussian approximation filter and smoother, where we 
have N = 2n -\- If with n being the dimension of the state 
vector x^. The quadrature points and the corresponding 
weights are given, respectively, by 
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and 



0, 



- + K)ei- 



2,- 



Wi = 



1 



i= 1, 



/ = 2, • • • , 2fz + 1, 



i = h 

, 2« + 1, 
(27) 



(28) 



2(n + k) 

where /c is a tunable parameter, and ei is the ith n dimen- 
sional unit vector. Note that /c = 0 is used as the default 
value in this paper, as in the cubature Kalman filter [24]. 
In addition, k = —?> can also be used as in the unscented 
Kalman filter [27]. 

4 The M-step: solving the l\ optimization 
problem 

Solving the l\ optimization problems in (8) is not trivial 
since is nondifferentiable at Qi — 0. The l\ optimiza- 
tion is a useful tool to obtain sparse solutions. Methods for 
solving linear inverse problems with sparse constraints are 
reviewed in [1]. Some more recent developments include 
the projected scaled subgradient [31] method, the gradi- 
ent support pursuit method [32], and the greedy sparse- 
simplex method [33]. In this paper, for the maximization 
step in the proposed rEM algorithm, due to the simplicity 
of implementation, we will employ a modified version of 
the iterative thresholding algorithm. 

4.1 Iterative thresholding algorithm 

Denote Q(0,0^) as the two summation terms in (8). We 
consider the optimization problem in (8) 

arg min/((9) = Q(0,0') + 2\\X oO\\i. (29) 
0 

The solution to (29) can be iteratively obtained by 
solving a sequence of optimization problems [34]. As in 
the Newton's method, the Taylor series expansion of the 
Q(Oy 0') around the solution 0^ at the rth iteration is given 
by 

Q(0'+AOJ') = Q(0'J')+A0^WQ(0'J')+^\\A0\\1 

(30) 

where VQ is the gradient of the negative Q-function and 
at is such that atl mimics the Hessian W^Q, Then, 0^~^^ is 
given by 

(9^+^=arg min (z-0Y^Q(0\0')+—\\z-0^\\l+2\\Xoz\\u 
z 2 

(31) 

where z denotes the variable to be optimized in the objec- 
tive function. 



The equivalent form of (31) is given by 
z 

1 



argmind|z-M'||^ + -||Xoz||i, (32) 

z ^ 



with = 



J 



0^ — vQ{e\e'\ 

\\^t\\2 ' 



(33) 

(34) 

(35) 
(36) 



Note that Equation 34 is derived as follows. Because we 
require that atl mimics the Hessian V-^Q, i.e., ats^ ^ r^, 
solving at in the least-squares sense, we have 



at ^ arg mm \\as 



(37) 



V ' 

The solution to (32) is given by 0^^^ = r]^{u\ ^), where 

r]^{u\ a) = sign(MOmax {\u^\ - a, O} (38) 

is the soft thresholding function with sign(M^) and 
max { — a, O} being component-wise operators. 

Finally, the iterative procedure for solving (29) is given 
by 



= sign {d^ 



max 



ott 

1 

ott 



2X 



,oj. 



(39) 



And the iteration stops when the following condition is 
met: 



< 6, 



\K0')\ 

where 6 is a given small number. 



(40) 



4.2 Adaptive selection of X 

So far, the parameters Xi in the Laplace prior are fixed. 
Here, we propose to adaptively tune them based on 
the output of the iterative thresholding algorithm. The 
algorithm consists of solving a sequence of weighted €i- 
minimization problems. A/ used for the next iteration are 
computed from the value of the current solution. A good 
choice of Xi is to make them counteract the influence of 
the magnitude of the li penalty function [35]. Following 
this idea, we propose an iterative re-weighted threshold- 
ing algorithm. At the beginning of the maximization step, 
we setXi = 1, V/. Then, we run the iterative thresholding 



algorithm to obtain 0, Next, we update Xi as Xi 



,v/. 



where 6 is a small positive number, and run the iterative 
thresholding algorithm again using the new X. The above 
process is repeated until it converges at the point where 
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the maximization step completes. Note that for the iter- 
ative re-weighted thresholding algorithm, the assumption 
that 0 has a Laplacian prior no longer holds. 

5 Application to gene regulatory network 
inference 

The gene regulatory network can be described by a graph 
in which genes are viewed as nodes and edges depict 
causal relations between genes. By analyzing collected 
gene expression levels over a period of time, one can 
find some regulatory relations between different genes. 
Under the discrete-time state-space modeling, for a gene 
regulatory network with n genes, the state vector Xk = 
[^i,kf ' ' -y^n^kY'' y where Xi^ky denotes the gene expression 
level of the /th gene at time k. 

In this case, the nonlinear function in the general 
dynamic Equation (1) is given by [15] 



fiXk-iJ) =Ag(xk-i), 



with 



and 



1 + e- 



V/= I,-. - 



(41) 



(42) 



(43) 



In (41), >l is an « X « regulatory coefficient matrix with 
the element Uij denoting the regulation coefficient from 
gene J to gene /. A positive coefficient Uij indicates that 
gene / activates gene /, and a negative Oij indicates that gene 
/ represses gene /. The parameter to be estimated is ^ = ^ 
which is sparse. 

For the measurement model, we have 



(44) 



5.1 Inference of gene regulatory network with four genes 

In the simulations, we consider a network with four genes. 
The true gene regulatory coefficients matrix is given by 



A = 



3 0 0 -4.5 
-2.9 0 5 0 
-6 4 0 0 

0-520 



(45) 



To compare the EM algorithm with the proposed rEM 
algorithm, the simulation was conduced ten times. In 
each time, the initial value oiA(0) is randomly generated 
from a Gaussian distribution with mean 0 and variance 
2. The EM, rEM, and rEMw, as well as the basis pursuit 
de-noising dynamic filtering (BPDN-DF) method and the 
li optimization method, are tested. Here, rEM^ denotes 
the version of the rEM algorithm with the iterative re- 
weighted thresholding discussed in Section 4.2. 



As a performance metrics, the receiver operating char- 
acteristic (ROC) curve is frequently used. However, for 
this specific example, with the increasing of the false- 
positive rate, the true-positive rate given by rEM and 
EM is always high (close to 1) which makes the dis- 
tinguishment of the performance of rEM algorithm and 
EM algorithm difficult. Hence, the root mean-squared 
error (RMSE) and the sparsity factor (SF) are used in this 
section. The RMSE is defined by 



RMSE: 



N N 



(46) 



i=l J=l 



where ^ denotes the estimated A. The SF is given by 

SF = — , (47) 
0 

where 0o and (j) are the number of zero values of the esti- 
mated parameter and the number of zero values of true 
parameter, respectively. It can be seen that the estimation 
is over sparse if the sparsity factor is greater than 1. 

In addition, to test the effectiveness of the proposed 
method at finding the support of the unknown param- 
eters, the number of matched elements is used and can 
be obtained by the following procedures: (1) Compute 
the support of A using the true parameters (denoted by 
As) and the support of A using the estimated parameters 
(denoted by As). Note that we assign [As]ij = 1 ifAij ^ 0 
and [^5]^y= OifAij = 0. Similarly, we assign [^5]^y= 1 if 
Aij 7^ 0 and [As]ij = 0 ifAij = 0. (2) Compute the num- 
ber of zero elements of As — As as the matched elements. 
It is easy to see that it is effective at finding the support 
of the unknown parameters when the number of matched 
elements is large. 

5. 7. 7 The effect of different X 

The performance of rEM using different X (10, 5, 1, 
0.5,0.1) is compared with the EM algorithm and the 
rEMi4/. The RMSE and SF are shown in Figures 1 and 2, 
respectively. The RMSE does not increase monotonously 
with the decreasing of parameter X, It can be seen that the 
rEM with X = 5 has better performance than that using 
other X. In addition, the rEM with all X except X = 10 out- 
performs the EM algorithm. It provides smaller RMSE and 
sparser result. The rEM^ provides the smallest RMSE and 
sparsest parameter estimation. The number of matched 
elements of test algorithms with different X is given in 
Figure 3. It can also be seen that rEM^^ provides more 
matched elements than the EM algorithm. 

5. 7.2 The effect of noise 

Two different cases are tested. In the first case, the covari- 
ance of the process noise and measurement noise are 
chosen to be 0.01. In the second case, they are chosen 
to be 0.1. The performance of two test cases is shown in 
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Figures 4, 5, 6. It can be seen that the RMSE of rEMn/ with 
U,R = 0.01/ is smaller than that with U,R = 0.1/. In 
addition, the rEM^ with U^R = 0.01 provides a larger 
number of matched elements than that with U,R = 0.1 
as shown in Figure 6. Hence, the estimation accuracy 
is better when the process noise and measure noise are 
small. 

5. 1.3 The effect of the number of observations 

In order to test the effect of the number of observa- 
tions, the rEMvi/ algorithm with 10 and 20 observations are 
tested. The simulation results are shown in Figures 7, 8, 9. 
It can be seen that rEM^ with more observations gives 
less RMSE. In addition, as shown in Figure 9, rEM^ with 
more observations gives slightly better result for finding 
the support of the unknown parameters. 



5.7.4 The effect of K 

In order to test the performance of /c, the rEM^ algo- 
rithm with different k (0,-1,-3) is tested. The performance 
results are shown in Figures 10, 11, 12. Note that the 
cubature rule corresponds to k = 0, and the unscented 
transformation corresponds to k = —1. Roughly speak- 
ing, the performance of tEM^ with different k is close. 
Specifically, it can be seen that the RMSE of rEMn^ using 
K = —1 and rEMvi/ using /c = — 3 is less than that of rEM^ 
using K = 0. The sparsity factor of rEMvi/ using k = —1 
is more close to 1 than that of rEMvi/ using k = —3 and 
K = 0. Moreover, the number of matched elements of 
rEMvi/ using k = —1 is more than that of rEMvi/ using 
K = —3 and /c = 0. Hence, the performance of rEM using 
K = —lis the best in this case. 
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5.1.5 Effect ofsparsity level 

The performance comparison of the tEMw and the con- 
ventional EM with different sparsity levels of A is shown 
in Figures 13, 14, 15. In this subsection, another A which 
is denser than the previously used A is given by 



A = 



3 -10 -4.5 
-2.9 0 5 1 
-6 4 0 -1 
1-520 



(48) 



Note that '(Denser)' is used to denote the result using 
A shown in Equation 48. It can be seen that the RMSE of 
rEMvi/ (Denser) is comparable to that of the EM(Denser). 
However, the sparsity factor of rEMn; (Denser) is closer 
to 1 than that of the EM(Denser) which means that 



the rEMvi/ (Denser) is better. In addition, the number of 
matched elements of the rEMvi/ (Denser) is large than that 
of the EM(Denser), which means that the rEMvi/ (Denser) 
is better than the EM(Denser) in finding the support 
of the unknown parameters. The performance of the 
rEMvi/ (Denser), however, is worse than that of the rEMvi/ 
in terms of the improvement of the RMSE. Hence, the 
rEM algorithm may have close performance with the EM 
algorithm when the sparsity is not obvious. 

5.1.6 Comparison with i i optimization 

We compare the proposed rEM algorithm and the li 
optimization-based method, as well as the conventional 
EM algorithm. The l\ optimization is a popular approach 
to obtain the sparse solution. For the problem under 
consideration, it obtains an estimate of 0 by solving the 
following optimization problem: 
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^ = arg min^[y^-^(^)^(;^^_i)]^[y;,-^(^)^(;^/,_i)] +X||^||i, 

(49) 

where X\ = and ^z^+i = ^(^/:). 

We also compare the l\ optimization method with the 
proposed rEMw algorithm, and the results are shown in 
Figures 16, 17, 18. Seven different A. (5, 2, 1, 0.5, 0.1, 0.05, 
and 0.01) are used in the t\ optimization method. The 
RMSE does not decrease monotonously with the decreas- 
ing of the parameter A. Among all tested values, the l\ 
optimization method with A. = 0.1 gives the smallest 
RMSE. However, the sparsity factor of the t\ optimization 
with A. = 0.1 is far from the ideal value 1. The t\ optimiza- 
tion with A. = 5 gives the best support detection as shown 



in Figure 18. The re-weighted t\ optimization algorithm is 
also used in the simulation. However, all l\ optimization- 
based methods cannot achieve better performance than 
that of using the rEMvi/. 

5. 7.7 Comparison with BPDN-DF 

To solve the problem using BPDN-DF, the model in (41) 
and (44) are modified as 



h =f(xk-i) 
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and 
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(50) 
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(52) 



where A, = [Ai, • • • , A. 20] with A./ = 0, / = 1, 2, 3, 4 since 
our objective is to explore the sparsity of the parameter 
0, The exact same initial values used in testing EM and 
rEM are used to test the performance of the BPDN-DR 
The simulation results are shown in Figures 19, 20, 21. It 
can be seen that although the sparsity factor of BPDN- 
DF is comparable with that of the rEMn/, the RMSE of the 
BPDN-DF is much larger than that of the rEMw In addi- 
tion, as shown in Figure 21, the rEMw is better than the 
BPDN-DF in finding the support of the unknown param- 
eters. The possible reason is that the BPDN-DF does not 
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consider the noise in the dynamic system, and the mea- 
surement matrix H/^ is an ill-conditioned matrix. In the 
simulation, Xj = 0.1,; = 5, • • • , 20. Based on our tests by 
using other values of A, there is no obvious improvement. 

5.2 Inference of gene regulatory network with eight 
genes 

In this section, we test the proposed algorithm using 
a larger gene regulatory network which includes eight 
genes; the performances of the EM, the rEM, the rEM^, 
the li optimization method, and BPDN-DF are given. 
Forty data points are collected to infer the structure of 
the network. The system noise and measurement noise 
are assumed to be Gaussian-distributed with means 0 and 
covariances = O.Ol/g and /?/^ = O.Ol/g, respectively. 
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For testing, each coefficient in A is initialized from a Gaus- 
sian distribution with mean 0 and variance 1. The system 
state is initialized using the first measurement. 

The metric used to evaluate the inferred GRN is the 
ROC curve, in which the true-positive rate (TPR) and the 
false-positive rate (FPR) are involved. They are given by 

TP# , , 

TPR = 3::r- =rT-, (54) 



TP# + FN# 



FPR = 



FP# 



(55) 

FP# + TN# 

where the number of true positives (TP#) denotes the 
number of links correctly predicted by the inference algo- 
rithm, the number of false positives (FP#) denotes the 
number of incorrectly predicted links. The number of 
true negatives (TN#) denotes the number of correctly 
predicted non-links, and the number of false negatives 
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Figure 21 The number of matched elements of BPDN-DF and 



(FN#) denotes the number of missed links by the inference 
algorithm [15]. 

The ROC curves of the EM, the rEM, and the tEM^ are 
compared in Figure 22. The rEM with different X is tested. 
In Figure 22, the curves of rEM with four typical values of 
k are shown. There is no obvious improvement by using 
other A. From the figure, it can be seen that the rEM^ 
performs better than the rEM and the convectional EM 
algorithms. 

In addition, the sparse solution is obtained by using rEM 
and tEMw while it cannot be obtained by using the EM 
algorithm. The sparsity factor of rEM and vEM^ is shown 
in Figure 23; the sparsity of the solution given by rEM^ 
is closer to the ground truth than that given by the EM 
algorithm. 
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Figure 23 Sparsity factor of the rEM and the rEIVl„,. 
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In Figure 24, the ROC curves of the rEM^, li optimiza- 
tion method, and BPDN-DF are compared. Similarly, the 
li optimization method with different k is tested, and only 
four curves are shown in the figure. By using other val- 
ues, there is no obvious improvement. The BPDN-DF with 
different X has no obvious difference in the test. From 
Figure 24, it can be seen that the rEMvi/ performs much 
better than the li optimization method and BPDN-DF 
algorithm. Hence, the sparsity factor of €i optimization 
method and BPDN-DF is not shown. 

5.3 Inference of gene regulatory network from malaria 
expression data 

The dataset with the first six gene expression data of 
malaria is given in reference [37] and is used in this 
section. The initial covariance for the algorithm is Pq = 
0.5/. The process noise and measurement noise are 
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assumed to be Gaussian noise with zero mean and covari- 
ance 0.3^/ and 0.4^/, respectively. In the following, we 
show the inference results of the parameter and the state 
estimation provided by the unscented Kalman filter (UKF) 
based on the model using the inferred parameters. 
The inferred A by the EM algorithm is 



The inferred A by the rEM^ is 
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(56) 



The inferred A by the rEM with A. = 1 is 
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The state estimation provided by the UKF based on 
the model using the inferred parameters of the EM, the 
rEM, the rEM^, and the true gene expression is shown 
in Figure 25. The left top and right top panels are the 
expression of the first gene and the second gene, respec- 
tively. The left center and right center panels are the 
expression of the third gene and the fourth gene, respec- 
tively. The left bottom and right bottom panels are the 
expression of the fifth gene and the sixth gene, respec- 
tively. It can be seen that the estimate gene expression 
using the UKF and parameters given by the EM, the 
rEM, and the rEM^ is close to the true gene expression 
data. In addition. The rEMvi/ algorithm provide sparser 
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solution than the rEM algorithm. Both the rEM and the 
rEMu/ algorithms give sparser solutions than the EM algo- 
rithm which validates the effectiveness of the proposed 
method. 

6 Conclusions 

In this paper, we have considered the problem of sparse 
parameter estimation in a general nonlinear dynamic sys- 
tem, and we have proposed an approximate MAP-EM 
solution, called the rEM algorithm. The expectation step 
involves the forward Gaussian approximation filtering 
and the backward Gaussian approximation smoothing. 
The maximization step employs a re-weighted iterative 
thresholding method. We have provided examples of the 
inference of gene regulatory network based on expres- 
sion data. Comparisons with the traditional EM algo- 
rithm as well as with the existing approach to solving 
sparse problems such as the li optimization and the 
BPDN-DF show that the proposed rEM algorithm pro- 
vides both more accurate estimation result and sparser 
solutions. 
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